Parse Postgres's LOCK TABLE statement #1614

freshtonic · 2024-12-20T05:27:56Z

See: https://www.postgresql.org/docs/current/sql-lock.html

PG's full syntax for this statement is supported:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

It is implemented as a new Statement variant so that it does not interfere with the roughly equivalent (but different) MySQL syntax.
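A minimal sketch of how the Postgres syntax above could be modelled and rendered as its own AST node. The names `PgLockMode`, `PgLockTarget` and `PgLockTable` are hypothetical and are not sqlparser's actual types; table names are plain `String`s instead of `ObjectName` to keep the sketch self-contained. Per Postgres's grammar, `ONLY` and `*` attach to each name in the list, and omitting `IN ... MODE` means `ACCESS EXCLUSIVE`.

```rust
use std::fmt;

/// Hypothetical enum for the eight Postgres lock modes (not sqlparser's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum PgLockMode {
    AccessShare,
    RowShare,
    RowExclusive,
    ShareUpdateExclusive,
    Share,
    ShareRowExclusive,
    Exclusive,
    AccessExclusive,
}

impl fmt::Display for PgLockMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            PgLockMode::AccessShare => "ACCESS SHARE",
            PgLockMode::RowShare => "ROW SHARE",
            PgLockMode::RowExclusive => "ROW EXCLUSIVE",
            PgLockMode::ShareUpdateExclusive => "SHARE UPDATE EXCLUSIVE",
            PgLockMode::Share => "SHARE",
            PgLockMode::ShareRowExclusive => "SHARE ROW EXCLUSIVE",
            PgLockMode::Exclusive => "EXCLUSIVE",
            PgLockMode::AccessExclusive => "ACCESS EXCLUSIVE",
        })
    }
}

/// One `[ ONLY ] name [ * ]` entry of the table list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PgLockTarget {
    pub name: String, // simplified stand-in for an ObjectName
    pub only: bool,
    pub star: bool,
}

/// Hypothetical Postgres-only `LOCK` statement node.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PgLockTable {
    pub table_keyword: bool,       // was the optional TABLE keyword written?
    pub targets: Vec<PgLockTarget>,
    pub mode: Option<PgLockMode>,  // None => Postgres uses ACCESS EXCLUSIVE
    pub nowait: bool,
}

impl fmt::Display for PgLockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        if self.table_keyword {
            f.write_str(" TABLE")?;
        }
        for (i, target) in self.targets.iter().enumerate() {
            // Space before the first name, comma-separated afterwards.
            f.write_str(if i == 0 { " " } else { ", " })?;
            if target.only {
                f.write_str("ONLY ")?;
            }
            f.write_str(&target.name)?;
            if target.star {
                f.write_str(" *")?;
            }
        }
        if let Some(mode) = self.mode {
            write!(f, " IN {mode} MODE")?;
        }
        if self.nowait {
            f.write_str(" NOWAIT")?;
        }
        Ok(())
    }
}
```

Because the node only ever describes Postgres syntax, its `Display` never has to decide between `TABLE` and `TABLES`.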

iffyio

Thanks @freshtonic!

iffyio · 2024-12-21T13:14:51Z

src/parser/mod.rs

+        let projection = if dialect_of!(self is PostgreSqlDialect | GenericDialect)
+            && self.peek_keyword(Keyword::FROM)
+        {
+            vec![]
+        } else {
+            self.parse_projection()?
+        };


we can probably skip these changes in this PR given it's now in #1613?

@iffyio oh my bad - I should have branched from the apache main branch instead of our fork's main before pushing. I'll remedy this.

@iffyio I tried this and it's not straightforward without storing a value on the variant that identifies the dialect that was used to parse the AST.

The following syntax would be problematic (to render, in Display):

LOCK customers;

In PG, the TABLE keyword is optional. In MySQL one of TABLE or TABLES is mandatory.

In the Display impl for Statement, the LockTable { .. } match arm could generate SQL that Postgres cannot parse if a TABLES keyword is emitted.

Is there precedent for choosing how to render an AST fragment using a stored value to encode the dialect (or a proxy to the dialect) that was used to parse the AST?

ah yeah, we could probably use an enum to represent the variants, something like:

    enum LockTableKind { TABLE, TABLES }

    Statement::LockTable { table_kind: Option<LockTableKind> }

see TableSampleKind for example
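A minimal sketch of how an `Option<LockTableKind>` lets `Display` emit exactly the keyword that was parsed, so `LOCK customers;` round-trips without gaining a `TABLES` keyword. The variants are spelled `Table`/`Tables` in Rust style, and `LockTableSketch` is a hypothetical stand-in for the statement that keeps only the keyword and table names.

```rust
use std::fmt;

/// Which table keyword (if any) followed `LOCK`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LockTableKind {
    Table,
    Tables,
}

impl fmt::Display for LockTableKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            LockTableKind::Table => "TABLE",
            LockTableKind::Tables => "TABLES",
        })
    }
}

/// Hypothetical, simplified statement: only the keyword and the table names.
pub struct LockTableSketch {
    pub table_kind: Option<LockTableKind>,
    pub tables: Vec<String>,
}

impl fmt::Display for LockTableSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        // None means no keyword was written (valid in Postgres only).
        if let Some(kind) = self.table_kind {
            write!(f, " {kind}")?;
        }
        write!(f, " {}", self.tables.join(", "))
    }
}
```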

src/ast/mod.rs

freshtonic · 2025-01-04T12:57:03Z

@iffyio I've pushed another attempt at this.

Munging the Postgres and MySQL versions together into the same Statement::LockTables { .. } variant was painful because it required additional fields just so that the Display implementation could reliably produce a valid statement for both MySQL and Postgres.

In my opinion, introducing a LockTables enum (with two variants for Postgres & MySQL and which is marked non_exhaustive in order to support other variations in the future) is less of a cognitive burden than inlining all variations as lots of optional fields on the same struct.

I should point out that this is a breaking change to the Statement::LockTables variant (the variant is now tuple-style with a single LockTables field instead of being a struct-style variant).
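A minimal sketch of the dialect-split shape described above: a tuple-style `Statement::LockTables(LockTables)` wrapping an enum with one variant per dialect. Field names, the simplified `String` table names and the `MySqlLockType` enum are hypothetical; Postgres's `ONLY`/`*` and MySQL's table aliases are omitted for brevity. MySQL's per-table lock types follow its `READ [LOCAL] | [LOW_PRIORITY] WRITE` syntax.

```rust
use std::fmt;

/// MySQL per-table lock types: READ [LOCAL] | [LOW_PRIORITY] WRITE.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MySqlLockType { Read, ReadLocal, Write, LowPriorityWrite }

impl fmt::Display for MySqlLockType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            MySqlLockType::Read => "READ",
            MySqlLockType::ReadLocal => "READ LOCAL",
            MySqlLockType::Write => "WRITE",
            MySqlLockType::LowPriorityWrite => "LOW_PRIORITY WRITE",
        })
    }
}

/// Hypothetical dialect-split enum: each variant can only hold its own dialect's syntax.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LockTables {
    /// LOCK [ TABLE ] name [, ...] [ IN mode MODE ] [ NOWAIT ]
    Postgres {
        table_keyword: bool,
        tables: Vec<String>,
        mode: Option<String>, // simplified: the mode's SQL text, e.g. "SHARE"
        nowait: bool,
    },
    /// LOCK {TABLE | TABLES} name lock_type [, ...]
    MySql {
        pluralized: bool,
        tables: Vec<(String, MySqlLockType)>,
    },
}

/// Tuple-style variant wrapping the enum (the breaking change).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Statement {
    LockTables(LockTables),
}

impl fmt::Display for LockTables {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LockTables::Postgres { table_keyword, tables, mode, nowait } => {
                f.write_str("LOCK")?;
                if *table_keyword {
                    f.write_str(" TABLE")?;
                }
                write!(f, " {}", tables.join(", "))?;
                if let Some(mode) = mode {
                    write!(f, " IN {mode} MODE")?;
                }
                if *nowait {
                    f.write_str(" NOWAIT")?;
                }
                Ok(())
            }
            LockTables::MySql { pluralized, tables } => {
                // MySQL requires one of the keywords, so there is no "no keyword" case.
                f.write_str(if *pluralized { "LOCK TABLES " } else { "LOCK TABLE " })?;
                let items: Vec<String> =
                    tables.iter().map(|(name, lock)| format!("{name} {lock}")).collect();
                f.write_str(&items.join(", "))
            }
        }
    }
}

impl fmt::Display for Statement {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Statement::LockTables(lock) => write!(f, "{lock}"),
        }
    }
}
```

Each `Display` arm only handles one grammar, which is the main argument made here for the split.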

Because this is a breaking change, and because directly signalling the DB dialect in the AST doesn't appear to be an idiom used elsewhere, I don't have much confidence this PR will be accepted, but one can hope :)

iffyio · 2025-01-05T11:06:11Z

src/ast/mod.rs

    /// ```sql
    /// UNLOCK TABLES
    /// ```
    /// Note: this is a MySQL-specific statement. See <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
-    UnlockTables,
+    UnlockTables(bool),


Suggested change

-    UnlockTables(bool),
+    UnlockTables(UnlockTables),

maybe we use a dedicated struct here as well? It's not clear what the bool property implies otherwise
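A minimal sketch of the dedicated-struct suggestion. It assumes (not confirmed in the diff) that the bool records whether the plural `TABLES` keyword was written, since MySQL accepts both `UNLOCK TABLE` and `UNLOCK TABLES`; the field name `pluralized_table_keyword` is hypothetical.

```rust
use std::fmt;

/// Hypothetical replacement for `UnlockTables(bool)`: the field name documents the flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct UnlockTables {
    /// Assumed meaning: true for `UNLOCK TABLES`, false for `UNLOCK TABLE`.
    pub pluralized_table_keyword: bool,
}

impl fmt::Display for UnlockTables {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.pluralized_table_keyword {
            f.write_str("UNLOCK TABLES")
        } else {
            f.write_str("UNLOCK TABLE")
        }
    }
}

/// Simplified statement enum showing the struct-carrying variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Statement {
    UnlockTables(UnlockTables),
}

impl fmt::Display for Statement {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Statement::UnlockTables(unlock) => write!(f, "{unlock}"),
        }
    }
}
```

A named field also leaves room to add more fields later without another change to the variant's shape.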

iffyio · 2025-01-05T11:18:34Z

src/ast/mod.rs

-pub struct LockTable {
-    pub table: Ident,
+#[non_exhaustive]
+pub enum LockTables {


I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects. Representation wise, I think both variants can be merged into a struct with something like the following?

    struct TableLock {
        pub table: ObjectName,
        pub alias: Option<Ident>,
        pub lock_type: Option<LockTableType>,
    }

    struct LockTable {
        pluralized_table_keyword: Option<bool>, // If None, then no table keyword was provided
        locks: Vec<TableLock>,
        lock_mode: Option<LockTableType>,
        only: bool,
        no_wait: bool,
    }

similarly, the parser could use the same impl to create the struct
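A minimal sketch of a single `Display` for the merged struct proposed above. `LockTableType` is abbreviated to four modes, and table names and aliases are `String`s standing in for `ObjectName`/`Ident`; all of this is hypothetical, not sqlparser's API. `only` is kept at statement level as in the proposal.

```rust
use std::fmt;

/// Abbreviated merged enum: a few MySQL and Postgres modes sharing one type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockTableType { Read, Write, AccessShare, AccessExclusive }

impl fmt::Display for LockTableType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            LockTableType::Read => "READ",
            LockTableType::Write => "WRITE",
            LockTableType::AccessShare => "ACCESS SHARE",
            LockTableType::AccessExclusive => "ACCESS EXCLUSIVE",
        })
    }
}

pub struct TableLock {
    pub table: String,                     // stand-in for ObjectName
    pub alias: Option<String>,             // stand-in for Ident
    pub lock_type: Option<LockTableType>,  // MySQL: per-table mode
}

pub struct LockTable {
    pub pluralized_table_keyword: Option<bool>, // None => no table keyword
    pub locks: Vec<TableLock>,
    pub lock_mode: Option<LockTableType>,       // Postgres: one mode for all tables
    pub only: bool,
    pub no_wait: bool,
}

impl fmt::Display for LockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        match self.pluralized_table_keyword {
            Some(true) => f.write_str(" TABLES")?,
            Some(false) => f.write_str(" TABLE")?,
            None => {}
        }
        if self.only {
            f.write_str(" ONLY")?;
        }
        for (i, lock) in self.locks.iter().enumerate() {
            f.write_str(if i == 0 { " " } else { ", " })?;
            f.write_str(&lock.table)?;
            if let Some(alias) = &lock.alias {
                write!(f, " AS {alias}")?;
            }
            if let Some(lock_type) = &lock.lock_type {
                write!(f, " {lock_type}")?;
            }
        }
        if let Some(mode) = &self.lock_mode {
            write!(f, " IN {mode} MODE")?;
        }
        if self.no_wait {
            f.write_str(" NOWAIT")?;
        }
        Ok(())
    }
}
```

One `Display` serves both dialects, but the types do not stop a caller from combining, say, `Some(true)` with a `lock_mode`, which is the trade-off debated below.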

I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects

MySQL's & Postgres's LOCK statements have minimal overlap. They are similar in name only.

MySQL allows a different lock mode per table, whereas Postgres applies one lock mode to all tables

MySQL's lock modes and Postgres's lock modes do not overlap at all

MySQL has freely interchangeable table keywords, one of which MUST be present: TABLE or TABLES

Postgres has one TABLE keyword but it's optional

Postgres supports additional (optional) ONLY and NOWAIT keywords

It is never valid to mix and match the Postgres-specific syntax with MySQL-specific syntax.
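The differences listed above can be read as validation rules that a single-dialect consumer of a merged struct (like the one proposed in review) would have to encode itself. A minimal sketch, with hypothetical names and an abbreviated `LockTableType`; it checks only the listed rules plus "at least one table", not every detail of either grammar.

```rust
/// Hypothetical target dialect for a consumer of the merged AST.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetDialect { Postgres, MySql }

/// Abbreviated merged enum mixing both dialects' modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockTableType {
    Read, ReadLocal, Write, LowPriorityWrite,       // MySQL
    AccessShare, RowExclusive, Share, AccessExclusive, // Postgres (subset)
}

impl LockTableType {
    /// MySQL's lock types and Postgres's lock modes do not overlap.
    fn is_mysql(self) -> bool {
        matches!(
            self,
            LockTableType::Read
                | LockTableType::ReadLocal
                | LockTableType::Write
                | LockTableType::LowPriorityWrite
        )
    }
}

pub struct TableLock {
    pub table: String,
    pub alias: Option<String>,
    pub lock_type: Option<LockTableType>,
}

pub struct LockTable {
    pub pluralized_table_keyword: Option<bool>,
    pub locks: Vec<TableLock>,
    pub lock_mode: Option<LockTableType>,
    pub only: bool,
    pub no_wait: bool,
}

/// Returns true if `stmt` only uses syntax valid for `dialect`.
pub fn is_valid_for(stmt: &LockTable, dialect: TargetDialect) -> bool {
    if stmt.locks.is_empty() {
        return false;
    }
    match dialect {
        TargetDialect::Postgres => {
            // Optional singular TABLE only; no aliases or per-table modes;
            // any statement-level mode must be a Postgres mode.
            stmt.pluralized_table_keyword != Some(true)
                && stmt.locks.iter().all(|l| l.alias.is_none() && l.lock_type.is_none())
                && stmt.lock_mode.map_or(true, |m| !m.is_mysql())
        }
        TargetDialect::MySql => {
            // TABLE or TABLES is mandatory, each table needs a MySQL lock type,
            // and Postgres-only ONLY / IN ... MODE / NOWAIT must be absent.
            stmt.pluralized_table_keyword.is_some()
                && stmt.locks.iter().all(|l| l.lock_type.map_or(false, |t| t.is_mysql()))
                && stmt.lock_mode.is_none()
                && !stmt.only
                && !stmt.no_wait
        }
    }
}
```

A dialect-specific AST would make most of these checks unnecessary, since the invalid combinations could not be constructed in the first place.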

I agree that explicit database-specific AST fragments make parser reuse more difficult but in the case of LOCK pretty much nothing is reusable.

The db-specific AST pieces do make implementing Display a lot less error prone.

It also makes it (almost) impossible to represent an invalid AST (e.g. a mix of PG & MySQL), except that I didn't take a hard stance on this for LockTableType, which does mix MySQL & Postgres bits.

None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.

What I mean by this:

When pieces of dialect-specific syntax are mixed in the same AST struct/enum variant the consumers have to understand the dialect-specific differences in order to know which fields they can safely ignore.

At CipherStash I wrote a type-inferencer for SQL statements in order to determine if specific transformations can be performed safely. It uses sqlparser's AST and there are a lot of cases where I had to spend time understanding which AST node fields or combinations of field values I can safely ignore when I'm only targeting Postgres.

Yeah, I agree that downstream crates targeting a single dialect would be easier to implement with essentially dialect-specific AST representations (at the other extreme, there are downstream crates that would like to process the AST in a dialect-agnostic manner, and we also have custom dialects in other downstream crates that need support). I think there are pros and cons to this approach vs. the current one followed by the parser, which puts some of the responsibility on the downstream crate. In any case, I think we would ideally keep to the current approach for this PR, while a shift in approach could be tackled as its own dedicated proposal.

iffyio · 2025-01-05T11:21:03Z

src/ast/mod.rs

 #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
 #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
 #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
+#[non_exhaustive]


Suggested change

-#[non_exhaustive]

I think we tend to not use this attribute, I think there are pros/cons with using it but better to keep with the existing convention in this PR

See: https://www.postgresql.org/docs/current/sql-lock.html

PG's full syntax for this statement is supported:

```
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
```

MySQL and Postgres support very different syntax for `LOCK TABLE`, so this is implemented as a breaking change to the `Statement::LockTables { .. }` variant, turning it into a variant that accepts a `LockTables` enum with variants for MySQL and Postgres.

freshtonic force-pushed the james/cip-1063-add-lock-table-support-in-sqlparser branch from 7ce8f33 to 47a3e5e December 20, 2024 05:38

iffyio reviewed Dec 21, 2024

freshtonic force-pushed the james/cip-1063-add-lock-table-support-in-sqlparser branch from 47a3e5e to cd9919b January 4, 2025 12:43

iffyio reviewed Jan 5, 2025

freshtonic force-pushed the james/cip-1063-add-lock-table-support-in-sqlparser branch from cd9919b to d5cde4f January 5, 2025 23:20
